Letter-Value Box Plots: Adjusting Box Plots for Large Data Sets
Abstract
Conventional boxplots (Tukey 1977) are useful displays for conveying rough information about the central 50% of the data and the extent of the data. For moderate-sized data sets (n < 1000), detailed estimates of tail behavior beyond the quartiles may not be trustworthy, so the information provided by boxplots is appropriately somewhat vague beyond the quartiles, and the expected number of “outliers” and “far-out” values for a Gaussian sample of size n is often less than 10 (Hoaglin, Iglewicz, and Tukey 1986). Large data sets (n ≈ 10,000 to 100,000) afford more precise estimates of quantiles in the tails beyond the quartiles and can also be expected to present a large number of “outliers” (about 0.4 + 0.007n). The letter-value boxplot addresses both of these shortcomings: it conveys more detailed information in the tails using letter values, but only out to the depths where the letter values are reliable estimates of their corresponding quantiles (the i-th letter value corresponds to a tail area of roughly 2^(-i)); “outliers” are defined as a function of the most extreme letter value shown. All aspects shown on the letter-value boxplot are actual observations, thus remaining faithful to the principles that governed Tukey’s original boxplot. We illustrate the letter-value boxplot with some actual examples that demonstrate its usefulness, particularly for large data sets.
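The tail areas of roughly 2^(-i) and the outlier estimate 0.4 + 0.007n mentioned in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the depth recursion is the textbook Tukey letter-value rule (median depth (n + 1)/2, then each next depth is (floor(d) + 1)/2), half-integer depths are averaged (which may differ from the paper's choice, since the abstract says everything shown is an actual observation), and the number of levels `k` is supplied by the caller because the paper's stopping rule is not given here. The function names `letter_values` and `expected_outliers` are hypothetical.

```python
import math


def letter_values(data, k):
    """Return [(tail_area, lower, upper)] for the first k letter values.

    Assumption: textbook Tukey depths, not necessarily the paper's exact rule.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    depth = (n + 1) / 2  # depth of the median (i = 1, tail area 1/2)
    result = []
    for i in range(1, k + 1):
        lo, hi = math.floor(depth), math.ceil(depth)
        # Average the two neighbouring order statistics when the depth is a
        # half-integer (an assumption; see the note above the code).
        lower = (xs[lo - 1] + xs[hi - 1]) / 2
        upper = (xs[n - lo] + xs[n - hi]) / 2
        result.append((2.0 ** -i, lower, upper))
        depth = (math.floor(depth) + 1) / 2  # depth of the next letter value
    return result


def expected_outliers(n):
    """Approximate outlier count quoted in the abstract: 0.4 + 0.007 n."""
    return 0.4 + 0.007 * n


if __name__ == "__main__":
    for area, lower, upper in letter_values(range(1, 10), 3):
        print(f"tail area {area}: lower={lower}, upper={upper}")
    print(expected_outliers(1000), expected_outliers(100_000))
```

For n = 1000 the estimate is about 7 flagged points, consistent with the "often less than 10" remark, while for n = 100,000 it is about 700, which is the clutter the letter-value boxplot is meant to avoid.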
Related resources
Workplace statistical literacy for teachers: interpreting box plots
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the appropriate knowledge and experience to interpret...
Should Young Students Learn About Box Plots?
In this chapter, we explore the challenges of learning about box plots and question the rationale for introducing box plots to middle school students (up to 14 years old). Box plots are very valuable tools for data analysis and for those who know how to interpret them. Research has shown, however, that some of their features make them particularly difficult for young students to use in authenti...
Box plots: use and interpretation
A box-and-whisker plot, often referred to as a box plot, was developed by John Tukey. It is a convenient graphic tool in descriptive analysis to display a group or groups of numerical data through their medians, means, quartiles, and minimum and maximum observations. A box plot is useful to display the distribution of data, examine symmetry, and indicate potential outliers and can also be used ...
Interactive XCMS Online: Simplifying Advanced Metabolomic Data Processing and Subsequent Statistical Analyses
XCMS Online (xcmsonline.scripps.edu) is a cloud-based informatic platform designed to process and visualize mass-spectrometry-based, untargeted metabolomic data. Initially, the platform was developed for two-group comparisons to match the independent, "control" versus "disease" experimental design. Here, we introduce an enhanced XCMS Online interface that enables users to perform dependent (pai...
Geographic Box Plots
Traditional box plots, described by Tukey in 1977, can be biased representations of geographic or spatial variables, since they do not take into account the areas associated with the elements of geographic variables. To help solve this problem, we propose and evaluate a spatially (areally) weighted box plot, a “geographic box plot,” that can be used to describe a wide variety of geographic vari...